Understanding the effect of varying amounts of replay per step
Paul, Animesh Kumar, Nema, Videh Raj
Model-based reinforcement learning uses models to plan, so an agent's predictions and policies can be improved with additional computation but without additional data from the environment, thereby improving sample efficiency. However, learning accurate models is hard. A natural question, then, is whether model-free methods can obtain benefits similar to those of planning. Experience replay is an essential component of many model-free algorithms, enabling sample-efficient learning and stability by providing a mechanism to store past experiences for reuse in gradient computation. Prior work has established connections between models and experience replay by planning with the latter. This involves increasing the number of times a mini-batch is sampled and used for updates at each step (the amount of replay per step). We exploit this connection through a systematic study of the effect of varying amounts of replay per step in a well-known model-free algorithm, Deep Q-Network (DQN), in the Mountain Car environment. We empirically show that increasing replay improves DQN's sample efficiency, reduces the variation in its performance, and makes it more robust to changes in hyperparameters. Altogether, this takes a step toward a better algorithm for deployment.
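The knob studied in this abstract can be made concrete with a toy loop (a minimal sketch, not the authors' implementation; the `train` function and the stand-in transitions are hypothetical): each environment step appends one transition to the replay buffer and is followed by `replay_per_step` minibatch updates.

```python
import random

def train(env_steps, replay_per_step, batch_size=32):
    """Minimal sketch of a DQN-style loop: `replay_per_step` controls
    how many minibatch updates follow each environment step."""
    buffer, updates = [], 0
    for step in range(env_steps):
        buffer.append((step, random.random()))  # stand-in for (s, a, r, s')
        if len(buffer) >= batch_size:
            for _ in range(replay_per_step):  # the quantity varied in the study
                batch = random.sample(buffer, batch_size)
                # ... compute TD targets and take one gradient step here ...
                updates += 1
    return updates
```

More replay per step yields proportionally more gradient updates from the same stream of environment data, which is the mechanism behind the sample-efficiency gains reported above.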
Approximate Tolerance and Prediction in Non-normal Models with Application to Clinical Trial Recruitment and End-of-study Success
A prediction interval covers a future observation from a random process in repeated sampling, and is typically constructed by identifying a pivotal quantity that is also an ancillary statistic. Outside of normality it can sometimes be challenging to identify an ancillary pivotal quantity without assuming some of the model parameters are known. A common solution is to identify an appropriate transformation of the data that yields normally distributed observations, or to treat model parameters as random variables and construct a Bayesian predictive distribution. Analogously, a tolerance interval covers a population percentile in repeated sampling and poses similar challenges outside of normality. The approach we consider leverages a link function that results in a pivotal quantity that is approximately normally distributed and produces tolerance and prediction intervals that work well for non-normal models where identifying an exact pivotal quantity may be intractable. This is the approach we explore when modeling recruitment interarrival time in clinical trials, and ultimately, time to complete recruitment.
Towards Robust Direct Perception Networks for Automated Driving
We consider the problem of engineering robust direct perception neural networks with regression outputs. Such networks take high-dimensional input images and produce affordances such as the curvature of the upcoming road segment or the distance to the front vehicle. Our proposal starts by allowing a neural network prediction to deviate from the label with tolerance $\Delta$. The tolerance can be contractual in origin, or can stem from limiting factors where two entities may label the same data with slightly different numerical values. The tolerance motivates a non-standard loss function where the loss is set to $0$ so long as the prediction-to-label distance is less than $\Delta$. We further extend the loss function and define a new provably robust criterion that is parametric in the allowed output tolerance $\Delta$, the layer index $\tilde{l}$ where perturbation is considered, and the maximum perturbation amount $\kappa$. During training, the robust loss is computed by first propagating symbolic errors from the $\tilde{l}$-th layer (with magnitude bounded by $\kappa$) to the output layer, followed by computing the overflow between the error bounds and the allowed tolerance. The overall concept is demonstrated by engineering a direct perception neural network that predicts the central position of the ego-lane in pixel coordinates.
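The zero-inside-the-band loss described above can be sketched as follows. Note the hedge: the abstract only states that the loss is $0$ while the prediction-to-label distance is below $\Delta$; penalizing the squared overflow outside the band, and the name `tolerance_loss`, are our assumptions for illustration.

```python
import numpy as np

def tolerance_loss(pred, label, delta):
    """Loss is 0 while |pred - label| <= delta; outside the band,
    penalize the overflow (squared overflow is an assumed choice)."""
    overflow = np.maximum(np.abs(pred - label) - delta, 0.0)
    return float(np.mean(overflow ** 2))
```

A prediction within the tolerance band incurs no gradient, so training effort concentrates on examples whose error exceeds what the labeling ambiguity can explain.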
A Gentle Introduction to Statistical Tolerance Intervals in Machine Learning
It can be useful to have upper and lower limits on data. These bounds can help identify anomalies and set expectations for what values are plausible. A bound on observations from a population is called a tolerance interval. A tolerance interval differs from a prediction interval, which quantifies the uncertainty of a single predicted value, and from a confidence interval, which quantifies the uncertainty of a population parameter such as a mean.
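As an illustration, a two-sided tolerance interval for approximately normal data can be computed with Howe's approximation to the k-factor (a sketch assuming SciPy is available; the function name and the 95% coverage / 95% confidence settings are our choices):

```python
from math import sqrt

import numpy as np
from scipy.stats import chi2, norm

def normal_tolerance_interval(data, coverage=0.95, confidence=0.95):
    """Two-sided tolerance interval for (approximately) normal data,
    using Howe's approximation to the k-factor."""
    n = len(data)
    dof = n - 1
    z = norm.ppf((1 + coverage) / 2)           # quantile for population coverage
    chi2_crit = chi2.ppf(1 - confidence, dof)  # chi-square value exceeded with prob. `confidence`
    k = sqrt(dof * (1 + 1 / n) * z ** 2 / chi2_crit)
    mean, std = float(np.mean(data)), float(np.std(data, ddof=1))
    return mean - k * std, mean + k * std

rng = np.random.default_rng(1)
data = rng.normal(loc=50.0, scale=5.0, size=100)
low, high = normal_tolerance_interval(data)
print(f"95%/95% tolerance interval: ({low:.1f}, {high:.1f})")
```

Because the k-factor exceeds the plain normal quantile, the tolerance interval is wider than the naive mean ± 1.96 standard deviations, reflecting the extra uncertainty from estimating the mean and variance.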
On Prediction and Tolerance Intervals for Dynamic Treatment Regimes
Lizotte, Daniel J., Tahmasebi, Arezoo
We develop and evaluate tolerance interval methods for dynamic treatment regimes (DTRs) that can provide more detailed prognostic information to patients who will follow an estimated optimal regime. Although the problem of constructing confidence intervals for DTRs has been extensively studied, prediction and tolerance intervals have received little attention. We begin by reviewing in detail different interval estimation and prediction methods and then adapt them to the DTR setting. We illustrate some of the challenges of tolerance interval estimation stemming from the fact that we do not typically have data that were generated from the estimated optimal regime. We give an extensive empirical evaluation of the methods, discuss several practical aspects of method choice, and present an example application using data from a clinical trial. Finally, we discuss future directions within this important emerging area of DTR research.